Domain generalization (DG) aims to train a model that performs well in unseen domains with different distributions. This paper considers a more realistic yet more challenging scenario, namely Single Domain Generalization (Single-DG), where only a single source domain is available for training. To tackle this challenge, we first seek to understand when neural networks fail to generalize. We empirically identify a property of a model that correlates strongly with its generalization, which we coin "model sensitivity". Based on our analysis, we propose a novel strategy, Spectral Adversarial Data Augmentation (SADA), to generate augmented images targeted at the highly sensitive frequencies. Models trained with these hard-to-learn samples effectively suppress sensitivity in the frequency space, which leads to improved generalization performance. Extensive experiments on multiple public datasets demonstrate the superiority of our approach, which surpasses state-of-the-art single-DG methods.
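A minimal sketch of frequency-targeted augmentation in the spirit of SADA. The random perturbation, the high-pass mask, and the step size here are illustrative assumptions, not the paper's exact procedure (which selects frequencies adversarially from model sensitivity):

```python
import numpy as np

def spectral_perturb(img, freq_mask, eps=0.1, rng=None):
    """Perturb an image only at the frequencies selected by freq_mask.

    img:       2D grayscale array in [0, 1]
    freq_mask: boolean array (same shape, in FFT layout) marking the
               "sensitive" frequencies to attack -- how these are chosen
               (e.g. from gradient-based sensitivity) is model-specific.
    """
    rng = np.random.default_rng(rng)
    spec = np.fft.fft2(img)
    # Random complex perturbation restricted to the masked frequency band;
    # an adversarial variant would follow the loss gradient instead.
    noise = rng.standard_normal(img.shape) + 1j * rng.standard_normal(img.shape)
    spec = spec + eps * noise * freq_mask
    out = np.real(np.fft.ifft2(spec))
    return np.clip(out, 0.0, 1.0)

# Example: perturb only the high frequencies of a toy gradient image.
img = np.linspace(0.0, 1.0, 64 * 64).reshape(64, 64)
f = np.fft.fftfreq(64)
mask = (np.abs(f)[:, None] > 0.25) | (np.abs(f)[None, :] > 0.25)
aug = spectral_perturb(img, mask, eps=5.0, rng=0)
```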
Regression plays an essential role in many medical imaging applications for estimating various clinical risk or measurement scores. While training strategies and loss functions have been studied for deep neural networks in medical image classification tasks, the options for regression tasks are very limited. One of the key challenges is that the high-dimensional feature representations learned by existing popular loss functions, such as the mean squared error or L1 loss, are hard to interpret. In this paper, we propose a novel Regression Metric Loss (RM-Loss), which endows the representation space with semantic meaning by finding a representation manifold that is isometric to the label space. Experiments on two regression tasks, coronary artery calcium score estimation and bone age assessment, show that RM-Loss outperforms existing popular regression losses in both performance and interpretability. Code is available at https://github.com/dial-rpi/regression-metric-loss.
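A toy illustration of the isometry idea behind a metric-style regression loss: penalize the mismatch between pairwise feature distances and pairwise label distances. This is a deliberate simplification for intuition, not the exact RM-Loss formulation:

```python
import numpy as np

def metric_regression_loss(feats, labels):
    """Toy pairwise metric loss: encourage the distance between two
    samples' features to match the distance between their labels, so the
    learned manifold is (approximately) isometric to the label space.
    An illustrative simplification, not the paper's exact RM-Loss.

    feats:  (n, d) feature embeddings
    labels: (n,) scalar regression targets
    """
    fdist = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    ldist = np.abs(labels[:, None] - labels[None, :])
    return np.mean((fdist - ldist) ** 2)

# A 1D embedding that copies the labels is perfectly isometric: loss = 0.
labels = np.array([0.0, 1.0, 3.0])
feats = labels[:, None]
loss = metric_regression_loss(feats, labels)
```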
Nonconvex-nonconcave minimax optimization has been the focus of intense research over the last decade due to its broad applications in machine learning and operations research. Unfortunately, most existing algorithms cannot be guaranteed to converge and often suffer from limit cycles. Their global convergence relies on certain conditions that are difficult to check, including but not limited to the global Polyak-\L{}ojasiewicz condition, the existence of a solution satisfying the weak Minty variational inequality, and the $\alpha$-interaction dominant condition. In this paper, we develop the first provably convergent algorithm, called the doubly smoothed gradient descent ascent method, which eliminates limit cycles without requiring any additional conditions. We further show that the algorithm has an iteration complexity of $\mathcal{O}(\epsilon^{-4})$ for finding a game stationary point, which matches the best iteration complexity of single-loop algorithms under nonconvex-concave settings. The algorithm presented here opens up a new path for designing provable algorithms for nonconvex-nonconcave minimax optimization problems.
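The anchoring idea can be sketched as follows: take descent/ascent steps on the objective augmented with proximal terms toward slowly moving anchors for both variables. The specific update rules, parameter values, and the bilinear test function are illustrative assumptions; the paper's exact scheme and schedules may differ:

```python
import numpy as np

def doubly_smoothed_gda(grad_x, grad_y, x0, y0,
                        eta=0.05, r=2.0, beta=0.05, iters=3000):
    """Sketch of a doubly smoothed GDA scheme: gradient descent on
    f + (r/2)|x - z|^2 and ascent on f - (r/2)|y - v|^2, where the
    anchors z, v track the iterates with a slow exponential average.
    """
    x, y, z, v = x0, y0, x0, y0
    for _ in range(iters):
        gx = grad_x(x, y) + r * (x - z)   # descent direction with anchor pull
        gy = grad_y(x, y) - r * (y - v)   # ascent direction with anchor pull
        x, y = x - eta * gx, y + eta * gy
        z, v = z + beta * (x - z), v + beta * (y - v)  # move anchors slowly
    return x, y

# Bilinear toy problem f(x, y) = x * y: plain GDA spirals away from the
# stationary point (0, 0), while the doubly smoothed variant converges.
x, y = doubly_smoothed_gda(lambda x, y: y, lambda x, y: x, 1.0, 1.0)
```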
The optimal design of experiments typically involves solving an NP-hard combinatorial optimization problem. In this paper, we aim to develop a globally convergent and practically efficient optimization algorithm. Specifically, we consider a setting where the pre-treatment outcome data is available and the synthetic control estimator is invoked. The average treatment effect is estimated via the difference between the weighted average outcomes of the treated and control units, where the weights are learned from the observed data. Under this setting, we surprisingly observe that the optimal experimental design problem reduces to a so-called \textit{phase synchronization} problem. We solve this problem via a normalized variant of the generalized power method with spectral initialization. On the theoretical side, we establish the first global optimality guarantee for experiment design when pre-treatment data is sampled from certain data-generating processes. Empirically, we conduct extensive experiments to demonstrate the effectiveness of our method on both the US Bureau of Labor Statistics data and the Abadie-Diamond-Hainmueller California smoking data. In terms of the root mean square error, our algorithm surpasses the random design by a large margin.
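A minimal sketch of the generalized power method for phase synchronization with spectral initialization: recover unit-modulus phases $z$ from a noisy rank-one Hermitian matrix $C \approx z z^H$. The planted-problem setup and noise level are illustrative; the paper's normalized variant and design-specific reduction are not reproduced here:

```python
import numpy as np

def generalized_power_method(C, iters=50):
    """Entrywise-normalized power iteration for phase synchronization.
    Spectral initialization: leading eigenvector of C, projected to the
    unit circle entrywise; then repeat z <- normalize(C @ z).
    """
    evals, evecs = np.linalg.eigh(C)     # ascending eigenvalues
    z = evecs[:, -1]
    z = z / np.abs(z)                    # project each entry to |z_i| = 1
    for _ in range(iters):
        z = C @ z
        z = z / np.abs(z)
    return z

# Planted problem: noisy rank-one phase matrix.
rng = np.random.default_rng(0)
n = 30
z_true = np.exp(1j * rng.uniform(0, 2 * np.pi, n))
noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
noise = (noise + noise.conj().T) / 2     # Hermitian noise
C = np.outer(z_true, z_true.conj()) + 0.1 * noise
z_hat = generalized_power_method(C)
# Alignment up to a global phase; close to 1 for small noise.
corr = np.abs(np.vdot(z_hat, z_true)) / n
```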
Nonconvex-concave minimax optimization has attracted intense interest in machine learning, including learning that is robust to data distribution shifts, learning with non-decomposable losses, and adversarial learning, to name a few. However, most existing works focus on gradient descent ascent (GDA) variants that can only be applied in smooth settings. In this paper, we consider a family of minimax problems whose objective function enjoys a nonsmooth composite structure in the minimization variable and is concave in the maximization variables. By fully exploiting the composite structure, we propose a smoothed proximal linear descent ascent (\textit{smoothed} PLDA) algorithm and further establish its $\mathcal{O}(\epsilon^{-4})$ iteration complexity, which matches that of smoothed GDA~\cite{zhang2020single} under smooth settings. Moreover, under the mild assumption that the objective function satisfies a one-sided Kurdyka-\L{}ojasiewicz condition with exponent $\theta \in (0,1)$, we can further improve the iteration complexity to $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$. To the best of our knowledge, this is the first provably efficient algorithm for nonsmooth nonconvex-concave problems that achieves the optimal iteration complexity $\mathcal{O}(\epsilon^{-2})$ when $\theta \in (0,1/2]$. As a byproduct, we discuss different stationarity concepts and quantitatively clarify their relationships, which could be of independent interest. Empirically, we illustrate the effectiveness of the proposed smoothed PLDA on variation regularized Wasserstein distributionally robust optimization problems.
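An illustrative primal-dual iteration for a composite minimax toy problem, handling the nonsmooth $\ell_1$ term with a proximal (soft-thresholding) step in the spirit of prox-linear descent ascent. The toy instance, step sizes, and the strongly concave $y$-term are assumptions for the sketch; the paper's smoothed PLDA handles a far more general composite structure:

```python
import numpy as np

def soft_threshold(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def prox_descent_ascent(A, b, lam=0.5, rho=1.0, eta=0.1, iters=2000):
    """Toy iteration for
        min_x max_y  lam*||x||_1 + y^T(A x - b) - (rho/2)||y||^2.
    x-update: proximal gradient step (soft-thresholding for the l1 term);
    y-update: plain gradient ascent.
    """
    x = np.zeros(A.shape[1])
    y = np.zeros(A.shape[0])
    for _ in range(iters):
        x = soft_threshold(x - eta * (A.T @ y), eta * lam)  # prox step in x
        y = y + eta * (A @ x - b - rho * y)                 # ascent step in y
    return x, y

# With A = I and rho = 1, eliminating y yields the lasso problem
# min_x lam*||x||_1 + 0.5*||x - b||^2, solved by soft_threshold(b, lam).
b = np.array([2.0, -1.0, 0.2, 0.0])
x, y = prox_descent_ascent(np.eye(4), b)
```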
In recent years, advances in EEG-based emotion recognition have received widespread attention in the fields of human-computer interaction and cognitive science. However, how to recognize emotions with limited labels has become a new research and application bottleneck. To address this problem, this paper proposes a Self-supervised Group Meiosis Contrastive learning framework (SGMC), based on the stimulus consistency of EEG signals across humans. In SGMC, a novel genetics-inspired data augmentation method called Meiosis is developed. It exploits the stimulus alignment among EEG samples within a group to generate augmented groups by pairing, crossover, and separation. The model employs a group projector to extract group-level feature representations from EEG samples triggered by the same emotional video stimulus. Contrastive learning is then used to maximize the similarity of the group-level representations of augmented groups that share the same stimulus. SGMC achieves state-of-the-art emotion recognition results on the publicly available DEAP dataset, with accuracies of 94.72% and 95.68% on the valence and arousal dimensions, and also achieves competitive performance of 94.04% on the public SEED dataset. Notably, SGMC shows significant performance even with limited labels. In addition, feature visualization results suggest that the model may have learned emotion-related feature representations that improve emotion recognition. The effect of group size is further evaluated in a hyperparameter analysis. Finally, control experiments and an ablation study are conducted to examine the rationality of the architecture. The code is publicly available online.
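The crossover step of a Meiosis-style augmentation can be sketched as follows: two EEG samples recorded under the same stimulus exchange segments to form two augmented samples. The split point and array shapes are illustrative assumptions; the full SGMC pipeline's pairing and separation logic is omitted:

```python
import numpy as np

def meiosis_crossover(a, b, split=None):
    """Toy Meiosis-style crossover: two samples (channels x time) that
    share a stimulus swap their second halves, yielding two augmented
    samples whose group-level stimulus content is preserved.
    """
    t = a.shape[-1]
    s = t // 2 if split is None else split
    a2 = np.concatenate([a[..., :s], b[..., s:]], axis=-1)
    b2 = np.concatenate([b[..., :s], a[..., s:]], axis=-1)
    return a2, b2

# Two toy "EEG" samples (32 channels x 128 time points).
rng = np.random.default_rng(0)
a = rng.standard_normal((32, 128))
b = rng.standard_normal((32, 128))
a2, b2 = meiosis_crossover(a, b)
```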
In this paper, we study the design and analysis of a class of efficient algorithms for computing the Gromov-Wasserstein (GW) distance tailored to large-scale graph learning tasks. Armed with the Luo-Tseng error bound condition~\citep{luo1992error}, the two proposed algorithms, called Bregman Alternating Projected Gradient (BAPG) and hybrid Bregman Proximal Gradient (hBPG), enjoy convergence guarantees. Based on task-specific properties, our analysis further provides novel theoretical insights into how to select the best-fit method. As a result, we are able to provide comprehensive experiments validating the effectiveness of our methods on a host of tasks, including graph alignment, graph partition, and shape matching. In terms of both wall-clock time and modeling performance, the proposed methods achieve state-of-the-art results.
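A schematic of the alternating Bregman structure: an entropic (KL) mirror step on the quadratic GW term's gradient, followed by a KL projection onto one marginal constraint, alternating between rows and columns. The gradient form (for symmetric cost matrices), step size, and iteration count are simplifying assumptions, not the exact BAPG:

```python
import numpy as np

def bapg_gw(D1, D2, p, q, tau=0.1, iters=200):
    """Schematic Bregman Alternating Projected Gradient for a
    Gromov-Wasserstein coupling T between spaces with distance matrices
    D1, D2 and marginals p, q. Uses the gradient of the quadratic term
    -2<D1 T D2, T> (valid for symmetric D1, D2)."""
    T = np.outer(p, q)                      # independent coupling init
    for it in range(iters):
        G = -4.0 * D1 @ T @ D2              # gradient of -2 tr(D1 T D2 T^T)
        T = T * np.exp(-tau * G)            # Bregman (KL) mirror step
        if it % 2 == 0:                     # KL-project onto row marginal
            T = T * (p / T.sum(axis=1))[:, None]
        else:                               # KL-project onto column marginal
            T = T * (q / T.sum(axis=0))[None, :]
    # final row projection so that T @ 1 = p exactly
    return T * (p / T.sum(axis=1))[:, None]

# Two identical 5-point metric spaces on a line, uniform marginals.
x = np.linspace(0.0, 1.0, 5)
D = np.abs(x[:, None] - x[None, :])
p = np.full(5, 0.2)
q = np.full(5, 0.2)
T = bapg_gw(D, D, p, q)
```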
One of the most fundamental problems in network research is community detection. The stochastic block model (SBM) is a popular model, and community detection consistency results have been established for a variety of estimation methods. However, SBM is limited by a strong assumption: all nodes in the same community are stochastically equivalent, which may not be appropriate for practical applications. We introduce the pairwise covariates-adjusted stochastic block model (PCABM), a generalization of SBM that incorporates pairwise covariate information. We study the maximum likelihood estimation of the covariate coefficients and the community assignments, and show that under suitable sparsity conditions both the coefficient estimates and the community assignments are consistent. Spectral clustering with adjustment (SCWA) is introduced to solve PCABM efficiently. Under certain conditions, we derive the error bound for community detection under SCWA and show that it is community detection consistent. In addition, model selection and feature selection for the pairwise covariates are studied, and two corresponding algorithms are proposed. PCABM is compared with SBM and the degree-corrected stochastic block model (DCBM) when covariate information is accessible.
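A minimal sketch of the adjust-then-cluster idea for two communities and a single pairwise covariate: divide out the covariate effect, then split nodes by the sign of the second eigenvector of the adjusted adjacency matrix. The adjustment form, the sign-based split, and the assumption that the coefficient is already estimated are simplifications of SCWA, whose general-K version runs k-means on several eigenvectors:

```python
import numpy as np

def scwa_two_blocks(A, X, beta):
    """Sketch of Spectral Clustering With Adjustment for K = 2:
    adjust A by the pairwise covariate effect exp(beta * x_ij), then
    assign communities by the sign of the second eigenvector."""
    adj = A / np.exp(beta * X)             # divide out covariate effect
    evals, evecs = np.linalg.eigh(adj)     # ascending eigenvalues
    v2 = evecs[:, np.argsort(evals)[-2]]   # second-largest eigenvalue
    return (v2 > 0).astype(int)

# Planted 2-block SBM with no covariate effect (X = 0, beta = 0).
rng = np.random.default_rng(0)
n = 60
z = np.repeat([0, 1], n // 2)
P = np.where(z[:, None] == z[None, :], 0.6, 0.05)
A = np.triu((rng.random((n, n)) < P).astype(float), 1)
A = A + A.T                                # symmetric, no self-loops
labels = scwa_two_blocks(A, np.zeros((n, n)), 0.0)
acc = max(np.mean(labels == z), np.mean(labels != z))  # up to label flip
```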
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot, or only marginally, benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS token- and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) weak regularization is preferred. With these findings, we achieve significant fine-tuning accuracy improvements over scratch MIM pre-training on ImageNet-1K classification for the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, setting a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, namely by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
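Token-relation distillation can be sketched as matching softmax-normalized token-token similarity maps between student and teacher. Note that TinyMIM distills relations derived from attention (e.g. Q-K, V-V); for brevity this sketch uses plain feature similarities, and the temperature and shapes are illustrative:

```python
import numpy as np

def token_relation_distill_loss(feat_s, feat_t, tau=1.0):
    """Toy token-relation distillation loss: each model's token-token
    similarity map is softmax-normalized row by row, and the student is
    trained to match the teacher's map via cross-entropy.

    feat_s, feat_t: (n_tokens, dim) student / teacher token features.
    """
    def relation(f):
        sim = f @ f.T / (tau * np.sqrt(f.shape[1]))
        sim = sim - sim.max(axis=1, keepdims=True)   # numerically stable softmax
        e = np.exp(sim)
        return e / e.sum(axis=1, keepdims=True)

    R_s, R_t = relation(feat_s), relation(feat_t)
    return -np.mean(np.sum(R_t * np.log(R_s + 1e-12), axis=1))

# The loss is minimized when the student reproduces the teacher's map.
rng = np.random.default_rng(0)
t = rng.standard_normal((8, 16))
s_rand = rng.standard_normal((8, 16))
loss_same = token_relation_distill_loss(t, t)
loss_rand = token_relation_distill_loss(s_rand, t)
```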
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.